Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgroups/systemd: add cgroup-v2 path to the list when using hybrid mode #2087

Conversation

mauriciovasquezbernal
Copy link
Contributor

Currently, the parent process of the container is moved to the right cgroup-v2 tree when systemd is using a hybrid model (last line with 0::):

$ runc --systemd-cgroup run myid
/ # cat /proc/self/cgroup
12:cpuset:/system.slice/runc-myid.scope
11:blkio:/system.slice/runc-myid.scope
10:devices:/system.slice/runc-myid.scope
9:hugetlb:/system.slice/runc-myid.scope
8:memory:/system.slice/runc-myid.scope
7:rdma:/
6:perf_event:/system.slice/runc-myid.scope
5:net_cls,net_prio:/system.slice/runc-myid.scope
4:freezer:/system.slice/runc-myid.scope
3:pids:/system.slice/runc-myid.scope
2:cpu,cpuacct:/system.slice/runc-myid.scope
1:name=systemd:/system.slice/runc-myid.scope
0::/system.slice/runc-myid.scope

However, if a second process is executed in the same container, it is not moved to the right cgroup-v2 tree:

$ runc exec myid /bin/sh -c 'cat /proc/self/cgroup'
12:cpuset:/system.slice/runc-myid.scope
11:blkio:/system.slice/runc-myid.scope
10:devices:/system.slice/runc-myid.scope
9:hugetlb:/system.slice/runc-myid.scope
8:memory:/system.slice/runc-myid.scope
7:rdma:/
6:perf_event:/system.slice/runc-myid.scope
5:net_cls,net_prio:/system.slice/runc-myid.scope
4:freezer:/system.slice/runc-myid.scope
3:pids:/system.slice/runc-myid.scope
2:cpu,cpuacct:/system.slice/runc-myid.scope
1:name=systemd:/system.slice/runc-myid.scope
0::/user.slice/user-1000.slice/session-8.scope

Having the processes of the container in its own cgroup-v2 is useful for any BPF programs that rely on bpf_get_current_cgroup_id(), like https://github.com/kinvolk/inspektor-gadget/ for instance.

This commit makes that processes executed with exec are placed into the right cgroup-v2 tree. The implementation checks if systemd is using a hybrid mode (by checking if cgroups-v2 is mounted in /sys/fs/cgroup/unified), if yes, the path of the cgroup-v2 slice for this container is saved into the cgroup path list.

dongsupark pushed a commit to flatcar-archive/coreos-overlay that referenced this pull request Aug 2, 2019
Fix an issue of cgroup v2 path being incorrectly configured when using
the hybrid mode in systemd.

Bump docker-runc to 1.0.0-rc8-r2 as well.

See also opencontainers/runc#2087
@mauriciovasquezbernal
Copy link
Contributor Author

Hello folks, any thoughts about this one?

Thanks!

/cc @filbranden, @crosbymichael @giuseppe @cyphar

@dongsupark
Copy link

@cyphar
It's what I mentioned during the talk in the hallway.

@mauriciovasquezbernal
Copy link
Contributor Author

@AkihiroSuda I changed the names according to your suggestion.

@AkihiroSuda
Copy link
Member

Please squash commits and sign

@mauriciovasquezbernal mauriciovasquezbernal force-pushed the mauricio/pr/add_cgroupv2_path branch from 1485233 to c33ba15 Compare March 10, 2020 14:47
@AkihiroSuda
Copy link
Member

AkihiroSuda commented Mar 10, 2020

LGTM

Approved with PullApprove

@AkihiroSuda
Copy link
Member

@opencontainers/runc-maintainers PTAL

@AkihiroSuda
Copy link
Member

@mrunalp @kolyshkin PTAL

@kolyshkin
Copy link
Contributor

Should we ever try to support hybrid mode? If yes, it should at least be tested. Currently, we barely test v2.

@dongsupark
Copy link

FYI, I have just realized that this PR needs a rebase.
Apparently code around cgroups/systemd has changed recently with v1.0.0-rc91.

@kolyshkin
Copy link
Contributor

I think we don't want to support this two-headed beast of hybrid mode, and recommend using cgroup v2 unified instead. IOW close this.

@opencontainers/runc-maintainers WDYT?

@cyphar
Copy link
Member

cyphar commented Feb 3, 2021

Yeah supporting hybrid mode will only lead to madness. cgroupv2 now supports all of the major controllers so hybrid mode doesn't really make sense anymore. I think we should close this.

@AkihiroSuda
Copy link
Member

I agree that hybrid mode is meaningless for controllers like cpu and memory, but it is still useful for eBPF stuff.
So I think it wouldn't hurt to merge this if somebody can rebase this.

Anyway, we will need to close this if this remains unrebased.

@mauriciovasquezbernal mauriciovasquezbernal force-pushed the mauricio/pr/add_cgroupv2_path branch 3 times, most recently from cecb69d to b9df855 Compare February 5, 2021 20:30
@mauriciovasquezbernal
Copy link
Contributor Author

@AkihiroSuda @kolyshkin @cyphar I rebased this.

@cyphar cyphar requested a review from kolyshkin February 7, 2021 05:19
@cyphar
Copy link
Member

cyphar commented Feb 7, 2021

@kolyshkin Does this change make sense given all of the changes you made last year to how subsystem paths are handled in cgroupfs?

@@ -116,6 +116,12 @@ func FindCgroupMountpoint(cgroupPath, subsystem string) (string, error) {
return "", errUnified
}

// If subsystem is empty it means that we are looking for the
// cgroups2 path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say "cgroup2 hybrid path" instead.

@kolyshkin
Copy link
Contributor

Looks like it still makes sense. Two things

  • fs cgroup driver is not fixed wrt hybrid mode;
  • I don't like the hack of using empty subsystem to denote the hybrid path much. Perhaps it can be better implemented in fs driver by adding NameGroup{Name: "hybrid", Join: yes"} to the list of subsystems, and modifying other places to support a special value of "hybrid".

Finally, if we're doing it, we need a test case (apparently ubuntu 20.04 which is used on GHA is a good candidate for that).

@kolyshkin
Copy link
Contributor

(CI failure on centos7 is unrelated; it's #2760)

@AkihiroSuda
Copy link
Member

I don't like the hack of using empty subsystem to denote the hybrid path much. Perhaps it can be better implemented in fs driver by adding NameGroup{Name: "hybrid", Join: yes"} to the list of subsystems, and modifying other places to support a special value of "hybrid".

The empty string corresponds to /proc/PID/cgroup . Also, v1 named group has nothing to do with the hybrid hierarchy.

@kolyshkin
Copy link
Contributor

The empty string corresponds to /proc/PID/cgroup

Yes, I understand that it comes from there.

Also, v1 named group has nothing to do with the hybrid hierarchy.

... and yet we put the path to it into a map of paths for v1 controllers.

Anyway, I'm fine with the current approach -- it's a hack but at least it is documented in the code.

The other two issues remain:

  • fs cgroup driver is not fixed wrt hybrid mode;
  • need a simple test case.

Currently the parent process of the container is moved to the right
cgroup v2 tree when systemd is using a hybrid model (last line with 0::):

$ runc --systemd-cgroup run myid
/ # cat /proc/self/cgroup
12:cpuset:/system.slice/runc-myid.scope
11:blkio:/system.slice/runc-myid.scope
10:devices:/system.slice/runc-myid.scope
9:hugetlb:/system.slice/runc-myid.scope
8:memory:/system.slice/runc-myid.scope
7:rdma:/
6:perf_event:/system.slice/runc-myid.scope
5:net_cls,net_prio:/system.slice/runc-myid.scope
4:freezer:/system.slice/runc-myid.scope
3:pids:/system.slice/runc-myid.scope
2:cpu,cpuacct:/system.slice/runc-myid.scope
1:name=systemd:/system.slice/runc-myid.scope
0::/system.slice/runc-myid.scope

However, if a second process is executed in the same container, it is
not moved to the right cgroup v2 tree:

$ runc exec myid /bin/sh -c 'cat /proc/self/cgroup'
12:cpuset:/system.slice/runc-myid.scope
11:blkio:/system.slice/runc-myid.scope
10:devices:/system.slice/runc-myid.scope
9:hugetlb:/system.slice/runc-myid.scope
8:memory:/system.slice/runc-myid.scope
7:rdma:/
6:perf_event:/system.slice/runc-myid.scope
5:net_cls,net_prio:/system.slice/runc-myid.scope
4:freezer:/system.slice/runc-myid.scope
3:pids:/system.slice/runc-myid.scope
2:cpu,cpuacct:/system.slice/runc-myid.scope
1:name=systemd:/system.slice/runc-myid.scope
0::/user.slice/user-1000.slice/session-8.scope

This commit makes that processes executed with exec are placed into the
right cgroup v2 tree. The implementation checks if systemd is using a
hybrid mode (by checking if cgroups v2 is mounted in
/sys/fs/cgroup/unified), if yes, the path of the cgroup v2 slice for
this container is saved into the cgroup path list.

The fs group driver has a similar issue, in this case none of the runc
run or runc exec commands put the process in the right cgroups v2. This
commit also fixes that.

Having the processes of the container in its own cgroup v2 is useful
for any BPF programs that rely on bpf_get_current_cgroup_id(), like
https://github.com/kinvolk/inspektor-gadget/ for instance.

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
@mauriciovasquezbernal mauriciovasquezbernal force-pushed the mauricio/pr/add_cgroupv2_path branch from b9df855 to bfd64a7 Compare February 17, 2021 20:57
Check that runc run and runc exec put the process on the same cgroups v2
when using hybrid mode.

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
@mauriciovasquezbernal mauriciovasquezbernal force-pushed the mauricio/pr/add_cgroupv2_path branch from bfd64a7 to c64eb7f Compare February 17, 2021 21:14
@mauriciovasquezbernal
Copy link
Contributor Author

@kolyshkin I added a test and also implemented for the fs controller. I agree that using an empty string is not the cleanest solution, I tried to implement it in another way but I hit some issues. Unfortunately I don't have that much time to continue investigating other ways to implement this.

@kolyshkin
Copy link
Contributor

@mauriciovasquezbernal I am carrying this one in #3059 but this one can still be merged first (once rebased).

Note that the last patch needs to put cgroup_hybrid before cgroup_* (added by commit 44fcbfd).

@kolyshkin
Copy link
Contributor

@mauriciovasquezbernal I am carrying this one in #3059 but this one can still be merged first (once rebased).

Well, I'm actually wrong. This needs criu-3.16 which is still not released.

@mauriciovasquezbernal
Copy link
Contributor Author

It was a very long one! Thanks a lot for driving it forward @kolyshkin!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants